Systems and methods for cache memory authentication

ABSTRACT

Apparatus for authentication of system memory content, the apparatus operative in conjunction with system memory storing content (code or non-code) utilized by a CPU, the apparatus comprising cache memory and an authentication module configured to use free time over a memory interface defined between the system memory and the cache memory, for authentication of content stored in the system memory, including use of at least one signature stored on the system memory to authenticate the content.

FIELD OF THIS DISCLOSURE

The present invention relates generally to authentication and more particularly to authentication in accordance with given security policies.

BACKGROUND FOR THIS DISCLOSURE

In computer/processor architecture, cache memory is often used to accelerate code fetch and data read performance.

Co-pending published U.S. Application US2015263852 describes systems and methods for secure storage on external memory.

The disclosures of all publications and patent documents mentioned in the specification, and of the publications and patent documents cited therein directly or indirectly, are hereby incorporated by reference. Materiality of such publications and patent documents to patentability is not conceded.

SUMMARY OF CERTAIN EMBODIMENTS

Certain embodiments seek to provide on the fly continuous strong authentication of code or of data; the term “content” may be used herein to refer generally to a digital representation of either code or data. Typically, a signature is generated e.g. using SHA1/2, cyclic redundancy check (CRC), Galois field multipliers or any other suitable technology, and stored, for each memory range holding content (code or data). The signature is then used for authentication of that content. Encryption may be used to prevent duplication of the signature.

Authentication may also take place during sleep time or during power-up or boot, although during power-up, in a security system, the entire content is normally authenticated.

Strength of authentication is related to signature length. During initiation, a single signature may be stored for the entire content, so without incurring much overhead the single signature can be longer, hence stronger. However, for short sections, weaker (shorter) signatures may be used to avoid undue overhead.
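By way of non-limiting illustration only, the trade-off between signature length and overhead may be sketched as follows. The choice of SHA-256 as the "longer" signature and CRC-32 as the "shorter" signature, and the helper names `strong_signature`, `weak_signature` and `overhead`, are assumptions made for this sketch and are not mandated by the embodiments.

```python
import hashlib
import zlib

def strong_signature(data: bytes) -> bytes:
    """Longer (32-byte) signature, here a SHA-256 digest."""
    return hashlib.sha256(data).digest()

def weak_signature(data: bytes) -> bytes:
    """Shorter (4-byte) signature, here a CRC-32 value."""
    return zlib.crc32(data).to_bytes(4, "big")

def overhead(signature_len: int, range_len: int) -> float:
    """Storage overhead of a signature relative to the range it covers."""
    return signature_len / range_len

whole_image = bytes(64 * 1024)   # stand-in for the entire content
short_section = bytes(64)        # stand-in for one short range

whole_image_cost = overhead(len(strong_signature(whole_image)), len(whole_image))
short_strong_cost = overhead(len(strong_signature(short_section)), len(short_section))
short_weak_cost = overhead(len(weak_signature(short_section)), len(short_section))
```

In this sketch a 32-byte signature is negligible relative to a 64 KB image, but amounts to half the size of a 64-byte section, which is why a shorter signature may be preferred for short sections.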

When cache memory is used to accelerate code fetch and data read performance, windows of free time may become available on the memory interface, interspersed between periods of time in which the memory interface is occupied with transactions generated by the processor.

Certain embodiments seek to take advantage of free time available on cache-able memory interfaces, in order to read and authenticate memory content such as previously signed code or other previously signed data.

Certain embodiments seek to receive a cache module read address, e.g. from a processor (such as CPU) which generated that address, and to use the cache module read address to predict the address of upcoming fetch operations. Then, upcoming fetch operations are performed from an address space which may be located in a (typically external) system memory and which

a. was recently authenticated

And/or (dependent on policy/configuration/security profile)

b. will be authenticated soon. In some cases high priority authentication is to be performed after a first fetch from a range but before execution to ensure execution of authenticated code. For example authentication of high priority code that was just executed may be forced, even if this postpones authentication of some other code.

“Recent” authentication may be defined as authentication that occurred within a certain period of time from the previous authentication, and/or may be defined by any other suitable, use-case specific threshold and/or authentication may be deemed “recent” using a relative criterion e.g. authentication which occurred after a certain code was also authenticated, may be deemed “recent”.
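The prediction and "recent" criteria described above may, purely by way of example, be sketched as follows. The next-sequential-cache-line predictor, the 32-byte `LINE_SIZE`, the `Range` record and the time-based `max_age` threshold are all assumptions of this sketch; any other suitable predictor or recency criterion may be substituted.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_SIZE = 32  # assumed cache line size in bytes

@dataclass
class Range:
    low: int                           # first address of the range
    high: int                          # last address of the range (inclusive)
    last_auth: Optional[float] = None  # time of last successful authentication

def predict_next_fetch(read_address: int) -> int:
    """Assumed predictor: the start of the next sequential cache line."""
    return (read_address // LINE_SIZE + 1) * LINE_SIZE

def needs_authentication(ranges: List[Range], read_address: int,
                         now: float, max_age: float) -> Optional[Range]:
    """Return the range holding the predicted fetch if it was not recently authenticated."""
    target = predict_next_fetch(read_address)
    for r in ranges:
        if r.low <= target <= r.high:
            recent = r.last_auth is not None and now - r.last_auth <= max_age
            return None if recent else r
    return None  # predicted address lies outside every known range

ranges = [Range(0, 255, last_auth=10.0), Range(256, 511)]
```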

Certain embodiments seek to provide a method for authenticating code already loaded to a cache.

Certain embodiments seek to provide a method for loading code to cache for authentication before the code is used by the processor.

Certain embodiments seek to enhance security of code execution; instead of or in addition to authenticating code at some undefined time before execution thereof e.g. when booting, code is authenticated, according to certain embodiments, just before code execution, e.g. as the code enters the cache, typically utilizing free time available during those time periods in which the memory interface with the cache would otherwise be idle.

Certain embodiments seek to provide an improved apparatus and method for authenticating content, in which at least a portion of the content is authenticated after the content has been stored on a cache on a main die, rather than authenticating all content while it still resides on a system memory external to the main die.

Certain embodiments seek to perform authentication just before execution to prevent too long a time window between authentication (that might have been performed, say, upon installation/update) and execution (or other use) of content e.g. code or data. Typically, the policy is stored on external memory e.g. after or before the signature of each range, and is authenticated before execution, e.g. X seconds before execution e.g. typically no use of data is available on the system memory interface when fetched by the CPU, hence there is no use of free time. Thus, each authentication range may or may not have a separate authentication policy and these policy/ies may be associated with range/s via existing system hardware e.g. MPU or by storing a policy, encrypted, on the system's memory.

Certain embodiments seek to provide a method including installation or update of code/content in system memory including, optionally, conventional authentication of code/content; splitting of code/content into ranges; generating signatures for the ranges and storing the signatures, and optionally also policy/ies for the ranges, storing association/correspondence between range/s and their signature/s and policy/ies, e.g. in system memory, and execution of code/use of data. Execution typically includes reading content, e.g. code/data into cache, cache line after cache line, and authenticating the content during free time as per each chunk of content's authentication policy, e.g. by comparing signature/s computed from the content, to corresponding stored signature/s, which might be encrypted, executing (using) each chunk, if authenticated, and otherwise performing a pre-defined authentication-has-failed procedure e.g. outputting a warning and awaiting external input before executing.
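A minimal software sketch of the method just described is given below, by way of example only. It assumes SHA-256 as the signature function, fixed-size ranges, and hypothetical `install` and `run` helpers with caller-supplied `execute` and `on_failure` callbacks; the policy field is carried with each range but is not interpreted in this sketch, and free-time scheduling is omitted.

```python
import hashlib

def install(content: bytes, range_size: int, policy: str = "before_execution"):
    """Split content into ranges and build (low, high, signature, policy) records."""
    records = []
    for low in range(0, len(content), range_size):
        high = min(low + range_size, len(content)) - 1
        signature = hashlib.sha256(content[low:high + 1]).digest()
        records.append((low, high, signature, policy))
    return records

def run(content: bytes, records, execute, on_failure):
    """Authenticate each range, then use it; call on_failure on a mismatch."""
    for low, high, signature, _policy in records:
        chunk = content[low:high + 1]
        if hashlib.sha256(chunk).digest() == signature:
            execute(low, chunk)
        else:
            on_failure(low, high)   # pre-defined authentication-has-failed procedure

image = bytes(range(200))
records = install(image, 64)        # ranges 0-63, 64-127, 128-191, 192-199
corrupted = bytearray(image)
corrupted[70] ^= 0x01               # modify one byte inside the second range

executed, failed = [], []
run(bytes(corrupted), records,
    execute=lambda low, chunk: executed.append(low),
    on_failure=lambda low, high: failed.append((low, high)))
```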

Certain embodiments herein are particularly suited for “critical” code such as code that sends sensitive data, out of the device or silicon die.

It is appreciated that a cache is inaccessible from the outside, hence it is more secure, as opposed to external memory, which has an accessible interface. The signature may be stored on the external memory, typically encrypted; it is generally impractical to change data and generate a correct encrypted signature without knowing a secret.

The present invention typically includes at least the following embodiments:

Embodiment 1

Apparatus for authentication of system memory content, the apparatus operative in conjunction with system memory storing content (code or non-code) utilized by a CPU, the apparatus comprising:

-   cache memory; and
-   an authentication module configured to use free time over a memory interface defined between the system memory and the cache memory, for authentication of content stored in the system memory, including use of at least one signature stored on the system memory to authenticate the content.

The system memory may be internal or external and the content stored therein may comprise code or data (non-code) or both.

For example, the authentication module may include logic which schedules authentication according to one or more considerations, one consideration being that all other things being equal, or to the extent possible, authentication is best performed during periods/windows of free time over the memory interface.

The authentication module may use the free time for authentication of content stored in, or fetched to cache memory from, the system memory. Typically, content being authenticated is on the system memory, unless the policy to authenticate code located on the cache is enabled and used. However, content may be authenticated from external memory.

It is appreciated that the signature may be copied into an internal memory, in which case the authentication module may read the signature from the internal memory.

Storage of at least one signature on the system memory is more cost effective than storage of signature/s on internal memory e.g. because there is no need to provide a dedicated memory to store signatures.

Embodiment 2

Apparatus according to any of the preceding embodiments, wherein when the authentication module authenticates content, the authentication module reads the content directly from the system (external) memory.

Embodiment 3

Apparatus of any of the preceding embodiments where the signature is stored encrypted, rather than as plain text, in system memory, thereby to deter an attacker from replacing code so as to forge the signature.

If the signature is stored on a system memory and the signature mechanism is or becomes known, an attacker or hacker can easily replace the code and forge the signature.
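The forgery risk noted above may be illustrated, by way of example only, as follows. Because the Python standard library offers no block cipher, this sketch substitutes a keyed digest (HMAC-SHA256) for the encrypted signature of the embodiment; this is a different technique which shares the relevant property, namely that a valid stored value cannot be produced without a secret. The key `SECRET` and all function names are hypothetical.

```python
import hashlib
import hmac
import zlib

SECRET = b"device-secret"  # hypothetical secret assumed to be held on-die

def plain_signature(code: bytes) -> bytes:
    """Unprotected signature: anyone who knows the mechanism can recompute it."""
    return zlib.crc32(code).to_bytes(4, "big")

def keyed_signature(code: bytes, key: bytes) -> bytes:
    """Keyed digest standing in for an encrypted signature."""
    return hmac.new(key, code, hashlib.sha256).digest()

def verify_plain(code: bytes, stored: bytes) -> bool:
    return plain_signature(code) == stored

def verify_keyed(code: bytes, stored: bytes, key: bytes) -> bool:
    return hmac.compare_digest(keyed_signature(code, key), stored)

genuine = b"genuine code"
malicious = b"malicious code"

# The attacker overwrites both the code and its stored signature in system memory.
forged_plain = plain_signature(malicious)                # mechanism is known
forged_keyed = keyed_signature(malicious, b"bad-guess")  # secret is not known
```

In this sketch `verify_plain(malicious, forged_plain)` succeeds, whereas `verify_keyed(malicious, forged_keyed, SECRET)` fails.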

Embodiment 4

Apparatus according to any of the preceding embodiments where the content stored in system memory is also encrypted.

It is appreciated that the same encryption scheme may be used for the data and for the signature.

It is appreciated that for RSA signature schemes (say) there is no need to encrypt the signature, since no one other than the private key holder can reproduce the signature. When using RSA signatures, a digest operation implemented e.g. using SHA1/2 algorithms is followed by the signature operation. However, when using signature schemes other than RSA, it is possible to implement a digest of the data for example using SHA1/2 or cyclic redundancy check (CRC) or Galois field multiplication, but then the signature might be computed by an attacker. To prevent any scenario of computing the signature, the signature may be encrypted. If the data stored by the signature is also encrypted the protection level further rises.

Embodiment 5

Apparatus according to any of the preceding embodiments, which supports plural authentication ranges each signed separately using separate signatures each having specific storage locations respectively, and wherein signature data context is stored, including a pointer to the address most recently accessed, for at least one range whose authentication was interrupted.
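One possible software model of the stored signature data context is sketched below, by way of example only. It assumes an incremental SHA-256 computation, a 16-byte cache line (`LINE`) and a hypothetical `AuthContext` class; the embodiment itself does not prescribe these choices. Each context keeps the partial digest and a pointer to the next address, so that authentication of one range may be interrupted and later resumed.

```python
import hashlib

LINE = 16  # assumed cache line size in bytes

class AuthContext:
    """Per-range context: partial digest plus pointer to the next address to read."""

    def __init__(self, low: int, high: int):
        self.low, self.high = low, high    # inclusive range bounds
        self.next_address = low            # where authentication resumes
        self.state = hashlib.sha256()      # partial signature computation

    def step(self, memory: bytes, lines: int) -> bool:
        """Consume up to `lines` cache lines; return True once the range is complete."""
        for _ in range(lines):
            if self.next_address > self.high:
                break
            end = min(self.next_address + LINE, self.high + 1)
            self.state.update(memory[self.next_address:end])
            self.next_address = end
        return self.next_address > self.high

memory = bytes(range(128))
a = AuthContext(0, 63)
b = AuthContext(64, 127)

a_done_first = a.step(memory, 2)    # range A is interrupted after two lines
b_done = b.step(memory, 4)          # range B is authenticated in the meantime
a_done_second = a.step(memory, 2)   # range A resumes from its saved pointer
```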

Embodiment 6

Apparatus according to any of the preceding embodiments wherein the authentication module captures content fetched by the CPU from the cache memory and performs authentication on the content.

Embodiment 7

Apparatus according to any of the preceding embodiments, wherein the authentication module captures content fetched by the CPU from the system memory and performs authentication on the content. Typically, data fetched by the CPU is used during “memory interface busy” time rather than during free time.

A particular advantage of the embodiments 6, 7 is that capturing this data reduces time the authentication engine requires for reading the data. The data fetched by the processor is used by the CPU and is also authenticated on the fly.

Embodiment 8

Apparatus according to any of the preceding embodiments, wherein the authentication module performs at least one authentication operation when the CPU is idle.

Embodiment 9

Apparatus according to any of the preceding embodiments, wherein the authentication module receives at least one security indication from a system MPU, and acts upon the at least one security indication as determined by a pre-stored security policy.

Embodiment 10

Apparatus according to any of the preceding embodiments, where the security policy is selectably determined, per code functionality, for the authenticated range.

Embodiment 11

Apparatus according to any of the preceding embodiments, where at least a security level priority is selectably set, separately for each of a plurality of groups of ranges, each group comprising at least one range.

Embodiment 12

Apparatus according to any of the preceding embodiments, where the authentication module predicts content about to be fetched to the cache and then performs authentication of an address range known to be associated with the content.

Embodiment 13

Apparatus according to any of the preceding embodiments, where the authentication module, when reading memory, shares cache logic used by the CPU for memory read.

Embodiment 14

Apparatus according to any of the preceding embodiments wherein the content comprises code representing at least a portion of a procedure to be executed.

Embodiment 15

Apparatus according to any of the preceding embodiments wherein the authentication module is configured to perform authentication context switching.

Embodiment 16

Apparatus according to any of the preceding embodiments wherein an authentication address context is used to hold a range of at least one address, which is compared by compare logic with an address on an Address-In interface and wherein, if an address match occurs, higher authentication priority is assigned to the range, whereas if the compare logic does not indicate an address match, an authentication engine uses a CPU HIT duration to access the system memory thereby to read data therefrom for authentication.
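The compare logic of this embodiment may be modelled in software, by way of example only, as follows. The function names and the returned action labels are hypothetical; in particular, the "wait" action taken when there is neither an address match nor a cache HIT is an assumption of this sketch and is not stated in the embodiment.

```python
def address_match(context_low: int, context_high: int, address_in: int) -> bool:
    """Compare logic: does the Address-In value fall inside the context's range?"""
    return context_low <= address_in <= context_high

def on_cpu_access(context_low: int, context_high: int,
                  address_in: int, cache_hit: bool) -> str:
    """Decide what the authentication engine does for one CPU access."""
    if address_match(context_low, context_high, address_in):
        return "raise_priority"        # higher authentication priority for the range
    if cache_hit:
        return "read_system_memory"    # memory interface is free during a CPU HIT
    return "wait"                      # assumed: interface busy serving the CPU miss
```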

Embodiment 17

Apparatus according to any of the preceding embodiments wherein the procedure includes at least one high-criticality portion and at least one low-criticality portion and wherein the authentication ranges include at least pre-defined first and second authentication ranges such that the at least one high-criticality portion of the procedure is incorporated into the first authentication range and that the at least one low-criticality portion of the procedure is incorporated into the second authentication range, thereby to enable the authentication module to apply a first more stringent authentication policy to the first authentication range and a second less stringent authentication policy to the second authentication range.

Embodiment 18

Apparatus according to any of the preceding embodiments wherein the procedure includes at least one frequently-executed portion and at least one infrequently-executed portion and wherein the authentication ranges include at least pre-defined first and second authentication ranges such that the at least one frequently-executed portion of the procedure is incorporated into the first authentication range and that the at least one infrequently-executed portion of the procedure is incorporated into the second authentication range, thereby to enable the authentication module to authenticate the first authentication range more frequently than the second authentication range.
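By way of example only, selection between ranges having different authentication frequencies may be sketched as follows. The `period` policy field, the `RangePolicy` record and the "most overdue relative to its own period" rule are assumptions of this sketch; the embodiments do not prescribe a particular selection rule.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RangePolicy:
    name: str
    period: float      # assumed policy field: desired time between authentications
    last_auth: float   # time at which the range was last authenticated

def next_to_authenticate(ranges: List[RangePolicy], now: float) -> RangePolicy:
    """Pick the range that is most overdue relative to its own period."""
    return max(ranges, key=lambda r: (now - r.last_auth) / r.period)

# A frequently-executed range is given a short period, an infrequent one a long period.
frequent = RangePolicy("frequent", period=1.0, last_auth=8.0)
infrequent = RangePolicy("infrequent", period=10.0, last_auth=0.0)
```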

Embodiment 19

Apparatus according to any of the preceding embodiments wherein the authentication comprises on-the-fly authentication.

Embodiment 20

Apparatus according to any of the preceding embodiments, wherein the authentication module performs authentication operations only when the CPU is idle.

Embodiment 21

Apparatus according to any of the preceding embodiments wherein the authentication module knows that the CPU is idle due to a system-provided signal indicating idle state.

It is appreciated that when the CPU is idle it may not execute code, hence CPU idle state may be indicative of free time availability.

Embodiment 22

Apparatus according to any of the preceding embodiments wherein the system-provided signal indicates at least one of the following idle states: sleep state, wait loop, clock stop.

Embodiment 23

Apparatus according to any of the preceding embodiments wherein the security policy stipulates that code used for internal operation/s has a lower authentication priority, relative to code used for input from or output to an external device which enjoys a higher authentication priority.

Embodiment 24

Apparatus according to any of the preceding embodiments wherein the security policy stipulates that authentication of code must be performed before execution thereof.

Embodiment 25

Apparatus according to any of the preceding embodiments, at least one group comprising a plurality of ranges.

Embodiment 26

Apparatus according to any of the preceding embodiments wherein the system memory is external to at least one die on which at least the CPU and cache reside.

Embodiment 27

Apparatus according to any of the preceding embodiments wherein at least the CPU and cache, but not the system memory, are packed together in a single package.

Embodiment 28

Apparatus according to any of the preceding embodiments wherein the authentication module comprises a purely hardware module.

Embodiment 29

Apparatus according to any of the preceding embodiments, and also comprising a system MPU configured to send at least one security indication to the authentication module to be acted upon in accordance with a security-indication dependent security policy.

Embodiment 30

Apparatus according to any of the preceding embodiments wherein the CPU and authentication module reside on a CPU die, and the system memory is external to the CPU die.

Embodiment 31

Apparatus according to any of the preceding embodiments wherein the authentication module comprises a processor.

Embodiment 32

Apparatus according to any of the preceding embodiments wherein the authentication module includes logic configured at least once to schedule content authentication over a time period T selected by the logic to coincide at least partly with a window of free time over the memory interface.

Embodiment 33

Apparatus according to any of the preceding embodiments wherein the authentication module includes logic operative, if possible, to schedule content authentication over a time period T selected by the logic to coincide at least partly with a window of free time over the memory interface.

Embodiment 34

Apparatus according to any of the preceding embodiments wherein the logic is configured to schedule content authentication over a time period T selected by the logic to be entirely included, if possible, within the window of free time over the memory interface.
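The scheduling logic of the three preceding embodiments may be sketched, by way of example only, as follows. The representation of free windows as `(start, end)` pairs, the fallback to the longest window when no window can contain the whole period, and the return of `None` when no free window is known are assumptions of this sketch.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # (start, end) of a free window on the memory interface

def schedule(free_windows: List[Window], duration: int) -> Optional[Window]:
    """Choose a time period T of the given duration for content authentication."""
    # Preferred: T entirely included within a window of free time.
    for start, end in free_windows:
        if end - start >= duration:
            return (start, start + duration)
    if not free_windows:
        return None  # assumed: nothing to coincide with
    # Fallback: T coincides at least partly with the longest free window.
    start, _ = max(free_windows, key=lambda w: w[1] - w[0])
    return (start, start + duration)
```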

Embodiment 35

A method for authentication of system memory content which is operative in conjunction with system memory storing content utilized by a CPU, the method comprising: providing a cache memory; and employing an authentication module during free time over a memory interface defined between the system memory and the cache memory, for authentication of content stored in the system memory.

Embodiment 36

A method according to any of the preceding embodiments wherein the content is stored, for utilization by a CPU residing on a CPU die, on system memory external to the CPU die, the employing comprising:

storing, on the system memory,

-   the content, in a plurality of ranges each including plural cache lines; and
-   a signature set including at least one signature and a policy set including at least one authentication policy, thereby to define an additional, “security” range within the system memory which holds the signature and policy sets;
-   wherein each of the plurality of ranges is associated with a signature in the signature set and with an authentication policy in the policy set; and

providing a hardware authentication module on the CPU die which is configured to authenticate the content by accessing the content, range by range, including verifying at least one individual range's signature according to the individual range's policy and wherein the accessing occurs upon at least one occasion defined by the range's policy.

A particular advantage of this embodiment is that typically, not only the signature, but also the policy, is located in the external memory.

A particular advantage of using ranges with plural cache lines rather than authenticating one cache line at a time, is efficiency; the amount of memory required for authentication is smaller and the execution time is shorter since there is no need to read a separate signature for each cache line.

Each of N ranges may be associated with one of N signatures in the signature set.

Example: if there are 3 ranges the security range may contain: low address, high address, signature and policy and/or security level (may be part of the policy) for range1, ditto for range2, ditto for range3. Typically, each policy/signature in the security range is stored adjacent to “matching” high-low address or “matching” address-size of the range to which the policy/signature pertains.
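A possible byte layout for such a security range is sketched below, by way of example only. The record format (two 32-bit little-endian addresses, a 32-byte SHA-256 signature and a one-byte policy), the policy values and the helper names are assumptions of this sketch, not a layout required by the embodiments.

```python
import hashlib
import struct

# Assumed record: low address, high address, signature, policy byte (41 bytes, unpadded).
ENTRY = struct.Struct("<II32sB")

def build_security_range(content: bytes, ranges) -> bytes:
    """Pack one record per (low, high, policy) range, stored back to back."""
    table = b""
    for low, high, policy in ranges:
        signature = hashlib.sha256(content[low:high + 1]).digest()
        table += ENTRY.pack(low, high, signature, policy)
    return table

def lookup(security_range: bytes, address: int):
    """Return the (low, high, signature, policy) record covering an address, if any."""
    for low, high, signature, policy in ENTRY.iter_unpack(security_range):
        if low <= address <= high:
            return low, high, signature, policy
    return None

content = bytes(300)
security_range = build_security_range(
    content, [(0, 99, 2), (100, 199, 1), (200, 299, 0)])  # range1, range2, range3
```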

Embodiment 37

A method according to any of the preceding embodiments wherein the hardware authentication module is configured to authenticate the content by accessing the content, range by range, including verifying at least one individual range's signature according to the individual range's security level which is stored as part of the individual range's policy.

Embodiment 38

A method according to any of the preceding embodiments wherein at least one individual range's policy is signed by the individual range's signature, thereby to prevent unauthorized policy change.

Embodiment 39

A method according to any of the preceding embodiments wherein at least one individual range's policy is signed by a signature other than the individual range's signature, thereby to prevent unauthorized policy change.

Embodiment 40

Apparatus according to any of the preceding embodiments where each authentication range is stored in association with at least one predefined range address.

Embodiment 41

Apparatus according to any of the preceding embodiments wherein, when the CPU performs at least one of instruction fetch and data read, a fourth switch on the cache logic is set to state 1 and wherein, if the address matches one of the cache memory's line addresses, a cache HIT indication is generated, a first switch on the cache logic is at state 1, a second switch on the cache logic is at state 0, and a third switch on the cache logic is at state 0, and the CPU receives data directly from the cache, thereby to enable the authentication module to perform data read operations directly from the system memory to authenticate an address range.

Embodiment 42

Apparatus according to any of the preceding embodiments wherein the authentication module identifies, on an address input line, an address that matches an authenticated address, wherein, in the event of a HIT, the first and second switches are set to state 1 and the data from the cache memory reaches the authentication module for use in authentication and wherein otherwise, in case of a cache MISS in which an address of data read by the CPU matches an authenticated address, the first switch is set to state 0 and the authentication module loads the data for authentication.

Embodiment 43

Apparatus according to any of the preceding embodiments and wherein, when the CPU is idle, the third and fourth switches are set to 0 to allow the authentication module to use data recently loaded to the cache memory and wherein, for a cache HIT on the authenticated address, the first and second switches are set to state 1 by the authentication module to use data from the cache for authentication, whereas, for a cache MISS, switches 1 and 2 are set to state 0 to allow the authentication module to use data from the system memory for authentication.
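The switch states recited in the three preceding embodiments may be summarized, by way of example only, in the following sketch. The function `switch_states` is hypothetical; it returns only those switch states which are expressly stated above, and any switch whose state is not stated for a given case is deliberately omitted from the result rather than guessed.

```python
def switch_states(cpu_idle: bool, cache_hit: bool, auth_match: bool) -> dict:
    """Map a situation to the switch states stated for it (switch number -> state)."""
    states = {}
    if cpu_idle:
        # CPU idle: third and fourth switches at 0; first and second follow HIT/MISS.
        states[3] = 0
        states[4] = 0
        states[1] = states[2] = 1 if cache_hit else 0
        return states
    states[4] = 1                             # CPU instruction fetch or data read
    if cache_hit and not auth_match:
        states.update({1: 1, 2: 0, 3: 0})     # CPU receives data directly from the cache
    elif cache_hit and auth_match:
        states.update({1: 1, 2: 1})           # cache data also reaches the authentication module
    elif auth_match:
        states[1] = 0                         # MISS: authentication module loads the data
    return states
```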

Embodiment 44

Apparatus according to any of the preceding embodiments wherein a cache control signal is used to control bus peripheral signals when the authentication module performs a read operation.

Embodiment 45

Apparatus according to any of the preceding embodiments wherein a cache control signal is used to control the second, third and fourth switches on the cache logic.

Embodiment 46

Apparatus according to any of the preceding embodiments wherein cache status signalling provided by cache logic to the authentication module indicates at least one of: cache HIT/cache MISS; and a CPU idle state.

Embodiment 47

Apparatus according to any of the preceding embodiments and wherein occurrence of multiple cache hits indicates that free time is available during cache HITs.

Embodiment 48

Apparatus according to any of the preceding embodiments wherein the authentication module includes an authentication address context memory including a group of at least one register/s configured for storing next address for authentication for each of plural authentication ranges.

Embodiment 49

Apparatus according to any of the preceding embodiments wherein an authentication control selects an address context value to be driven on data-out by controlling a Multiplexer.

Embodiment 50

Apparatus according to any of the preceding embodiments wherein the authentication control is configured to load an initial address of a range to be authenticated and to increment an authentication address context after authentication data read.

Embodiment 51

Apparatus according to any of the preceding embodiments and wherein, once authentication data is present on DATA-in lines, a Load Data signal loads the authentication data into an Authentication engine having at least one of the following functionalities: SHA1, SHA2, cyclic redundancy check (CRC), GFx field multiplier.

Embodiment 52

Apparatus according to any of the preceding embodiments wherein the processor is configured to run code configured to use the free time for authentication of the content, and wherein the code is at least partly stored in a ROM.

Embodiment 53

Apparatus according to any of the preceding embodiments wherein the processor is configured to run code configured to use the free time for authentication of the content, and wherein the code is at least partly stored in a RAM and is downloaded and authenticated before being run by the processor.

Embodiment 54

Apparatus according to any of the preceding embodiments wherein the authentication module requests a data-read using at least one cache control signal including: at least one switch control signal; and at least one system bus control signal and wherein a resulting read address is driven through an address-out interface.

Embodiment 55

A method according to any of the preceding embodiments wherein the plurality of authentication ranges includes at least first and second authentication ranges respectively including different numbers of cache lines.

Embodiment 56

Apparatus according to any of the preceding embodiments, where each authentication range is stored in association with the range's size.

Embodiment 57

Apparatus according to any of the preceding embodiments wherein the processor is configured to run code configured to use the free time for authentication of the content, and wherein the code is at least partly stored in an embedded flash.

Embodiment 58

A method according to any of the preceding embodiments which operates on a cacheable system including a cache memory and cache logic.

Embodiment 59

A method according to any of the preceding embodiments, where each authentication range is stored in association with the range's size.

A computer program product, comprising a non-transitory tangible computer readable medium having computer readable program code embodied therein, said computer readable program code adapted to be executed to implement any method shown and described herein.

Also provided, excluding signals, is a computer program comprising computer program code means for performing any of the methods shown and described herein when said program is run on at least one computer; and a computer program product, comprising a typically non-transitory computer-usable or -readable medium e.g. non-transitory computer-usable or -readable storage medium, typically tangible, having a computer readable program code embodied therein, said computer readable program code adapted to be executed to implement any or all of the methods shown and described herein. The operations in accordance with the teachings herein may be performed by at least one computer specially constructed for the desired purposes or general purpose computer specially configured for the desired purpose by at least one computer program stored in a typically non-transitory computer readable storage medium. The term “non-transitory” is used herein to exclude transitory, propagating signals or waves, but to otherwise include any volatile or non-volatile computer memory technology suitable to the application.

The term processor includes a single processing unit or a plurality of distributed or remote such units. The above devices may communicate via any conventional communication means.

The apparatus of the present invention may include, according to certain embodiments of the invention, machine readable memory containing or otherwise storing a program of instructions which, when executed by the machine, implements some or all of the apparatus, methods, features and functionalities of the invention shown and described herein.

Any trademark occurring in the text or drawings is the property of its owner and occurs herein merely to explain or illustrate one example of how an embodiment of the invention may be implemented.

Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions, utilizing terms such as, “processing”, “computing”, “estimating”, “selecting”, “ranking”, “grading”, “calculating”, “determining”, “generating”, “reassessing”, “classifying”, “generating”, “producing”, “stereo-matching”, “registering”, “detecting”, “associating”, “superimposing”, “obtaining” or the like, refer to the action and/or processes of at least one computer/s or computing system/s, or processor/s or similar electronic computing device/s, that manipulate and/or transform data represented as physical, such as electronic, quantities within the computing system's registers and/or memories, into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. The term “computer” should be broadly construed to cover any kind of electronic device with data processing capabilities, including, by way of non-limiting example, embedded cores, personal computers, computing system, communication devices, processors (e.g. digital signal processor (DSP), microcontrollers, field programmable gate array (FPGA), application specific integrated circuit (ASIC), etc.) and other electronic computing devices.

Elements separately listed herein need not be distinct components and alternatively may be the same structure. A statement that an element or feature may exist is intended to include (a) embodiments in which the element or feature exists; (b) embodiments in which the element or feature does not exist; and (c) embodiments in which the element or feature exists selectably e.g. a user may configure or select whether the element or feature does or does not exist.

Any suitable processor/s may be employed to compute or generate information as described herein and/or to perform functionalities described herein and/or to implement any engine, interface or other system described herein. Any suitable computerized data storage e.g. computer memory may be used to store information received by or generated by the systems shown and described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

Certain embodiments of the present invention are illustrated in the following drawings:

FIG. 1 is a prior art diagram of a computing system including a processor and an associated system memory and cache.

FIG. 2 is a simplified block diagram illustration of a computing apparatus including a processor and an associated system memory and cache, constructed and operative in accordance with an embodiment of the present invention and including an authentication module typically operative on the fly.

FIG. 3 is a prior art diagram of a cache memory system without authentication, wherein the cache is typically fully associative.

FIG. 4 is a simplified block diagram illustration of an apparatus with a fully associative cache, where the authentication unit may read data from the external memory while the processor is fetching code or data from the cache, and may also use data already stored in cache memory or capture data fetched by the CPU from the main memory to enhance authentication speed e.g. for on-the-fly authentication.

FIG. 5 is a simplified block diagram illustration of an authentication module which may for example be used to implement the authentication module 700 of FIGS. 2, 4 which typically performs on-the-fly authentication.

FIG. 6 is an example of an events table useful for certain embodiments e.g. when it is desired to receive a cache module read address, e.g. from a processor (such as CPU) which generated that address, and to use the cache module read address to predict the address of upcoming fetch operations. The table of FIG. 6 may include only some of the rows, columns and cells specifically illustrated.

FIGS. 7-8 are simplified flowchart illustrations of methods provided in accordance with certain embodiments; these methods may comprise some or all of the illustrated operations, suitably ordered e.g. as shown.

Methods and systems included in the scope of the present invention may include some (e.g. any suitable subset) or all of the functional blocks shown in the specifically illustrated implementations by way of example, in any suitable order e.g. as shown.

Computational components described and illustrated herein can be implemented in various forms, for example, as hardware circuits such as but not limited to custom VLSI circuits or gate arrays or programmable hardware devices such as but not limited to FPGAs, or as software program code stored on at least one tangible or intangible computer readable medium and executable by at least one processor, or any suitable combination thereof. A specific functional component may be formed by one particular sequence of software code, or by a plurality of such, which collectively act or behave as described herein with reference to the functional component in question. For example, the component may be distributed over several code sequences such as but not limited to objects, procedures, functions, routines and programs and may originate from several computer files which typically operate synergistically.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

The following terms may be construed either in accordance with any definition thereof appearing in the prior art literature or in accordance with the specification, or as follows:

Authentication: intended to include using a signature to verify code or data.

Content: intended to include either code or data.

Data: As is apparent from the text, this term is used in two senses: either (a) as opposed to an address, e.g. data bus vs. address bus; or (b) as opposed to code, e.g. instructions/data in FIG. 1 or code vs. data in the embodiment of FIG. 4.

Signature: intended to include information generated by performing a computational process on code or data, e.g. using a cyclic redundancy check (CRC), Galois field multiplication, or hash function; private key or other encryption may optionally be used in addition.

Authentication ranges: intended to include memory ranges, typically limited by upper and lower addresses, each storing content, also termed herein a “chunk” (e.g. data, or at least a portion of code having at least one functionality) requiring authentication, and each being associated, via a predetermined chunk-signature association scheme, with a signature stored at a specific space/location in memory. The scheme may be fixed, for example range #5 is always associated with signature #5. The scheme may be stored on the external memory and loaded into the authentication engine.

Range, aka memory range: intended to include a continuous memory space, e.g. sequence of consecutively addressed cells in system memory. A range may then be identified, say, by a low address and a high address or by a low (or high) address and size. A range may include one or more Cache lines, given that data is typically transferred between system memory and cache in blocks of fixed size, called cache lines.

free time: The term “free time” is intended to include window/s of time in which a memory interface, which may comprise a main memory bus, is free of CPU code fetch and/or data read/write operations (e.g. has no such operations pending which need to be performed) such that the authentication process described herein may, within such window/s, read data from the system memory over the main memory bus e.g. so as to authenticate that data while minimizing detriment to system efficiency. The more data authentication can be made to occur within the windows of free time, the lower the penalty, in terms of performance, which needs to be paid for the additional security provided thereby.

On the Fly authentication or “during-operation authentication”: intended to include an authentication process, at least a portion of which is performed between an n'th processor access to main memory and an (n+1)th processor access to the main memory. For example, if the content is code, at least a portion of the content may be at least partly authenticated after (perhaps minutes or seconds after) the code was last authenticated and just before (e.g. milliseconds, seconds or minutes before) the code is executed or even while the code is being executed. For example, a process represented by 100 chunks of code may be executed chunk by chunk, and chunk x may be authenticated while chunk x+1 is being executed. If at least part of an authentication process occurs during free time, the authentication process is deemed to have been performed on the fly. Typically, content is authenticated then used as soon as it is authenticated; e.g. code may be run as soon as authentication thereof has been completed.

Security level: MPU (memory protection unit 800) may define a different security level per address range. In conventional systems, different applications may have different authorizations, or levels of authorization, to make memory changes, to operate certain code or to access certain data. This differential authorization may be managed by a memory controller, operative for enabling and disabling access into the memory during operation of certain code and then blocking this access. The security level (which may comprise a priority level) of certain areas may be outputted from the MPU and used to determine a security policy.

Security policy: for each level a policy may be defined which may for example stipulate one, some or all of: A. time between last authentication and fetch operation; B. always authenticate before usage? yes/no; C. use data fetch for authentication? yes/no; D. upon a fetch, give authentication high priority or lower priority?

security indication: intended to include security level priority or security level configuration.

security level configuration: intended to include authentication's priority level per address range or per security indication type from MPU. For example, authentication of lower priority code might be interrupted after first preserving context, in deference to a higher priority code now running. Then, after authentication of the high priority code has been completed, authentication of the lower priority code may resume from where it left off, as indicated by the preserved context.

security level priority: intended to include a parameter that may be used to represent priority for authentication of one range relative to another, when more than one range is to be authenticated at a given time.

security indication type: intended to include either security level priority or security level configuration.

system memory: intended to include one or more computer storage devices serving a CPU which are typically located outside of the CPU die.

cache: intended to include one or more computer storage devices which serve the CPU and are packed together with the CPU or located on the same silicon die as the CPU, hence are more secure than the system memory. The cache device/s are smaller than the system memory and either reside on the CPU or are closer to (more quickly accessible by) the CPU than the system memory is. The cache expedites CPU operation by storing portions of the content that the system memory stores.

External (to x): intended to include a component e.g. memory which does not reside on the same die as, and/or is not packed together in a single package as, component x. For example, in conventional PCs the memory is external to the CPU die to allow memory to be added in contrast to the CPU die in which no change can be made. For example, a component stored in or pertaining to system memory rather than being stored in or pertaining to the CPU's cache is deemed external. So content e.g. code is generally stored externally, other than the code (say) currently in the cache which, currently, is (also) stored internally.

Authentication context: intended to include the internal state of an authentication engine specifically, rather than the internal state of a processor generally. Any suitable information may be stored to sufficiently represent the internal state, such as but not limited to the value of the CRC computed for the data processed for zone x prior to the authentication range switching from x to y, and/or the SHA1/2 value of the data processed thus far.

CPU: may include multiple cores, each implemented as an independent die or all being etched on the same die.

die: intended to include a single continuous piece of semiconductor material e.g. silicon; may include plural cores. Or, one processor may include plural dies.

Systems and methods which utilize windows of free time for ongoing performance of authentication especially during run time, are now described with reference to FIGS. 1-2, and 4 onwards.

FIG. 1 is a simplified block diagram of a standard processor system with cache memory. As shown, the system memory (100) may be internal (e.g. located on the processor (CPU's) silicon die) or, more commonly, may be external (located on another, “external”, die). In FIG. 2, the die, also termed herein “the main die”, is indicated by a dotted line. The interface 200 may for example comprise a DDR interface, an SPI interface, or a conventional parallel interface. The interface (200) may include a decryption engine used during content (data or code) read operations or when an encrypted signature is being read. An encryption engine may be provided e.g. if data is being written into external memory.

FIG. 1 shows the Processor 400 connected via the cache memory interface (300) to the memory interface (200).

In other embodiments the CPU (400) may be connected both to the memory interface 200 and to the instruction cache 300.

Conventionally, a cache is used by the central processing unit (CPU) of a computer or of an embedded system to reduce the average time required to access data from the system memory. The cache is a smaller, faster memory, typically implemented on the die, which stores copies of content, e.g. data and/or code, from frequently used main memory locations. CPUs may have separate and independent caches for code and for data. Embodiments of the present invention may be implemented for instruction caches as well as for data caches; typically, however, each authenticated code or data range has a corresponding signature stored on the same memory or on a different memory e.g. the signature may be stored on the memory that stores the range or on another memory or even on an internal memory which is smaller than the system memory used for code and data storage.

FIG. 3 is a prior art block diagram of a fully associative cache with cache lines. One conventional way of relating cached data to the main memory address is to store both memory address and data together in the cache i.e. the fully associative mapping approach. A fully associative cache includes associative memory storing both the memory address and the data for each cached line.

Any incoming memory address may be simultaneously compared with all stored addresses using the internal logic of the associative memory. If a match (HIT) is found, the corresponding data is read out. Otherwise data is read from the system memory; this data may also be updated in the cache itself. It is appreciated that storing the entire address and implementing an address comparison mechanism for each data word is very costly. Normally, each cache line includes more than 1 byte or one word, for example cache lines may store 16 or 32 data bytes each. Since, in most cases during normal execution, the instruction address is incremented continuously, this approach increases the storing efficiency of the cache address and reduces comparison logic cost with very little performance loss.

FIG. 2 is a simplified high level block diagram of a processor apparatus including a cache and an authentication module for cached content (code or other data). Typically, on the fly authentication of external memory space is performed on each of multiple memory address ranges (“authentication ranges”) (e.g. 102, 102 in FIG. 4), for each of which a signature is generated (e.g. using a hash function on the data stored in the memory range) and stored on the external memory. Alternatively, the signatures may be stored in an internal memory.

A signature may be computed for plural ranges in parallel, e.g. during operation. A signature algorithm may be implemented using Galois field multiplication, modulo based multiplication or even checksum; the read order inside a range then has no effect on the signature result. If SHA1 or SHA2 algorithms are used for signature generation, the read order of data is typically fixed.
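The difference between order-independent and order-dependent signatures can be illustrated with a minimal sketch. It assumes, purely for illustration, a 32-bit additive checksum as the order-independent scheme and SHA-256 (a member of the SHA2 family) as the order-dependent one; the function names and the sample words are hypothetical and are not taken from the embodiments.

```python
import hashlib

def additive_checksum(words):
    """Order-independent signature (assumed example): sum of 32-bit words modulo 2**32."""
    return sum(words) & 0xFFFFFFFF

def sha256_signature(words):
    """Order-dependent signature: SHA-256 over the words in the order they are read."""
    h = hashlib.sha256()
    for w in words:
        h.update(w.to_bytes(4, "little"))
    return h.hexdigest()

# Hypothetical content of one authentication range, as 32-bit words.
range_words = [0x11111111, 0x22222222, 0xDEADBEEF, 0x00000001]
# The same words read in a different order, e.g. as captured from CPU fetches.
reordered = list(reversed(range_words))

checksum_same = additive_checksum(range_words) == additive_checksum(reordered)
sha_same = sha256_signature(range_words) == sha256_signature(reordered)

print("checksum unaffected by read order:", checksum_same)
print("SHA-256 unaffected by read order:", sha_same)
```

With an order-independent scheme the authentication module is free to consume words in whatever order they become available, whereas a SHA-type scheme typically requires the range to be read in a fixed order.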

To perform authentication, a signature is normally generated by feeding a stream of data into a hash function. Nonetheless, for any of the embodiments described above, it is possible to flit between authentication ranges when authentication is in progress. Thus, authentication of one range may be interrupted, in order to flit over to authentication of another range while authentication of the first range is still in progress i.e. still unfinished, and then, perhaps, flit back again to continue authenticating the first range, and so forth, until each range is eventually fully authenticated. To do this, the engine internal state (representing the current, unfinished state of signature computation for authentication of the first range x) may be stored before flitting from the unfinished authentication of range x to authentication of range y. Then, the authentication operation of a different section of memory, e.g. range y, is embarked upon and perhaps even concluded. Next, the internal state of the previous stream of data (e.g. of range x) is loaded and authentication of range x may then continue from the point the previous signature computation ceased.

For example, a need may arise to speed up execution of code, stored in a high priority range, that must be authenticated shortly before execution, while authentication of a lower priority range is ongoing. In this case, the ongoing authentication of the lower priority range may be halted, the authentication context relevant thereto may be stored, and authentication of the higher priority range, whose code is about to run, is initiated.

Typically, then, before each flit from range x to range y (or vice versa), an entire “signature context” (aka authentication context) is stored for range x (or y) so as to be able to switch (flit) back to range x eventually. Typically, the authentication context stored comprises:

a. Address Context (aka authentication address context, context address): the address of the last data that was authenticated e.g. the next address for authentication for each authentication range; and/or

b. Data Context (aka signature data context, data signature context): the internal state of the authentication engine just before flitting.

Any conventional context switch or interrupt handling techniques may be used to facilitate the above embodiment.
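The flitting described above can be sketched in software as follows. This is a minimal illustration only, assuming a CRC-32 signature (via Python's `zlib.crc32`) so that the data context is a single running value; the class `AuthEngine`, the helper `authenticate`, the 16-byte read size and the stand-in memory contents are all hypothetical.

```python
import zlib

class AuthEngine:
    """Toy authentication engine whose data context is one running CRC-32 value."""
    def __init__(self):
        self.crc = 0

    def load_data(self, chunk):
        # Fold the next chunk of the range into the running signature.
        self.crc = zlib.crc32(chunk, self.crc)

    def save_context(self):
        return self.crc

    def load_context(self, ctx):
        self.crc = ctx

memory = bytes(range(256)) * 4   # 1024 bytes standing in for system memory
RANGE_X = (0, 512)               # assumed lower-priority range
RANGE_Y = (512, 1024)            # assumed higher-priority range
READ_SIZE = 16                   # assumed bytes per authentication read

def authenticate(engine, start, end):
    """Feed memory[start:end] to the engine; return the next address to read."""
    addr = start
    while addr < end:
        engine.load_data(memory[addr:addr + READ_SIZE])
        addr += READ_SIZE
    return addr

engine = AuthEngine()

# Begin authenticating range x, but stop halfway through.
next_addr_x = authenticate(engine, RANGE_X[0], 256)
context_x = (next_addr_x, engine.save_context())  # address context + data context

# Flit to range y and authenticate it completely.
engine.load_context(0)
authenticate(engine, *RANGE_Y)
sig_y = engine.save_context()

# Flit back: restore range x's context and continue where it left off.
addr_ctx, data_ctx = context_x
engine.load_context(data_ctx)
authenticate(engine, addr_ctx, RANGE_X[1])
sig_x = engine.save_context()

print(hex(sig_x), hex(sig_y))
```

In this sketch the signature computed for range x after the interruption equals the signature that an uninterrupted pass over range x would produce, which is the property the stored context is meant to guarantee.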

Signature verification is done following the read of the last byte in a range. Signature computation may be reproduced when using conventional digest algorithms (such as but not limited to cyclic redundancy check (CRC), SHA1 or 2, Galois field multipliers), so an attacker who changes the data could also compute a matching signature for the changed data. Therefore, if the signature might be accessed by an attacker, the signature is typically encrypted using a secret key in order to protect the signature from being modified by an attacker.

The Authentication module 700 takes advantage of free time over the memory interface and uses data already existing in the cache; Authentication module 700 typically uses data fetched by the CPU 400 from the memory 100 for authentication, typically on the fly, and uses a cache 600, to read data from the external memory 100 with minimal performance loss for the CPU 400.

Cache 600, e.g. an instruction cache, may include memory and logic e.g. as shown in FIG. 4. The terms “instructions” and “code” are used herein generally interchangeably. Cache 600 may comprise existing cache hardware.

Alternatively or in addition, when the CPU activity level is low, the authentication logic 700 may take advantage of the cache memory space in order to speed up authentication when the CPU 400 is idle.

The authentication module 700 may also enforce a security policy, e.g. according to protection level (aka security level) input from a memory protection unit (MPU). The authentication module may receive security indications:

i. from a system MPU, and act thereupon as determined by a pre-stored security policy e.g. per region having a particular priority level; and/or

ii. according to a pre-configured setting of an authentication range, of the device itself and the address input into the cache device.

For example, code used for system output may be located at a certain range having a higher than average security level setting. This security level might be determined at least partly by a configuration of a specific range in the authentication unit and/or might be located in memory and read by the authentication unit before the first read by a processor from a range and/or might be determined at least partly by an input from the authentication unit. This security level may enforce high priority authentication e.g. the apparatus might stop (interrupt) the authentication of a current range now being authenticated, which has a lower security level and therefore has a lower priority, store the context of the current authentication process and immediately initiate (flit to) authentication of the high-security range in which the output code resides.

Alternatively, another policy might enforce an authentication time-out. For example, the policy might stipulate that if code to be executed was not authenticated during the last second (1 sec) before the output code runs, the code to be executed must be authenticated before being executed. Alternatively, another policy might enable the authentication unit to authenticate code already stored in the cache.
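A time-out policy of this kind might be checked as in the following minimal sketch. The names `RangePolicy` and `must_authenticate_before_execution` are hypothetical, and the treatment of a range that has never been authenticated (it must be authenticated first) is an assumption rather than something stated above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RangePolicy:
    """Hypothetical per-range policy record."""
    max_age_s: Optional[float]   # allowed time since last authentication; None = no time-out
    always_before_use: bool      # True = authenticate before every use

def must_authenticate_before_execution(policy, last_auth_time_s, now_s):
    """Return True if the range must be (re)authenticated before its code runs."""
    if policy.always_before_use:
        return True
    if last_auth_time_s is None:
        # Assumption: a range that was never authenticated is authenticated first.
        return True
    if policy.max_age_s is None:
        return False
    return (now_s - last_auth_time_s) > policy.max_age_s

# The 1 second time-out from the example above.
one_second_policy = RangePolicy(max_age_s=1.0, always_before_use=False)

recent = must_authenticate_before_execution(one_second_policy, 10.0, 10.4)
stale = must_authenticate_before_execution(one_second_policy, 10.0, 11.5)
print(recent, stale)
```

Here code authenticated 0.4 seconds ago may run immediately, while code last authenticated 1.5 seconds ago is authenticated again before execution.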

Policy management inputs may for example be provided to authentication module 700 by dedicated hardware, by suitably secured firmware, or by suitably secured boot code running from an internal secured memory or from an on-die ROM/Flash.

Still referring to FIG. 2, design considerations, some or all of which may be employed, are now described; for clarity these are described with reference to a numerical example which of course is not intended to be limiting.

Example System Parameters:

Assume the external memory contains 1 Mbyte of content (code and/or data) to be used by the CPU. Assume the 1 Mbyte used by the processor is divided into 50 authentication ranges, not necessarily equal in size, having an average size of 20 Kbytes. The size of the authentication ranges may be equal for simplicity or may be unequal. Range sizes may be selected e.g. dependent on some or all of the following considerations:

1. Security level:

a. Policy for non-critical code may stipulate that this code can be authenticated less frequently than critical code, and may be executed by the CPU even if authentication has not occurred particularly recently. Non-critical code may for example, as a matter of policy, be periodically authenticated (e.g. once each few hundred msec, each few seconds or each few minutes) whereas critical code is more frequently authenticated e.g. before each use thereof, or with a shorter period than the non-critical code.

b. Critical code may be required by policy to be authenticated just before being executed or otherwise used by the processor (e.g. the CPU executes code, or uses data, as soon as the CPU is notified that authentication of that code or data has been successfully completed), or no more than a predetermined time window before being executed or used. For example, the CPU 400 may execute the code, or use the data, 10 nsec-10 msec after authentication thereof.

2. Frequency of usage:

a. Code that is of low security level and is not frequently used may be stored, in the same authentication range, with other code of the same security level and may be authenticated periodically, e.g. on the fly, by the authentication unit 700 when the system memory is not used by the CPU 400.

b. Code that is of high security level and is frequently used may be stored separately in a relatively small authentication range. This allows short and frequent authentication to occur, with minimal effect on system performance. For example, if the policy stipulates that authentication is a prerequisite of (must occur before) execution, authentication of a small range is faster than authentication of a long authentication range, hence may have little or no practical detrimental effect on system performance.

c. Code that is of low security level and frequently used may be authenticated only once each predetermined time frame, assuming policy stipulates that periodic authentication of the low priority code suffices and there is no need to authenticate before each execution.

d. For code that is of high security level and infrequently used, the block size may be larger since the penalty incurred by a single authentication will not detract significantly from system performance.

Design options for Signature generation and storage may include some or all of:

1. One signature for the entire content (for the entire 1 Mbyte). In this case there may be a single strong signature (for example RSA 2048) for the entire content used by the CPU. If, for 1 Mbyte of code, a signature of 256 bytes (2048 bits) is stored, the overhead is 256 bytes per 1 Mbyte of content; the authentication of 1 Mbyte of data may require about 500 ms using an RSA accelerator with a 32 bit multiplier at 50 MHz. In this example the code is typically authenticated once, normally during boot. If the code is modified or corrupted after the authentication process, the system will typically not be alerted.

2. A small (e.g. weaker than the single signature of option 1) signature for each cache line or double word. For example, if the size of a typical cache line is 16 bytes, and if 4 bytes of signature are implemented per cache line, the signature overhead would be 256 Kbytes per 1 Mbyte of content, while the computation of each 4 byte signature would be fast. In this case, the signature is weaker, but the authentication is ongoing; any code that is fetched by the CPU 400 may be authenticated after being loaded to the main die, just before execution, although authentication is weak to the extent that the signature is small.

3. Continuing the example above, if each one of the 50 blocks is signed separately with a 128 bit signature (16 bytes), the 128 bit signature is much stronger than the 32 bit signature used in Option 2 above and the signature overhead is: 50×16 bytes=800 bytes. This overhead is much smaller than the 256 Kbytes in Option 2. Option 3 is also potentially faster, since only 16 bytes of signature are read for each 20 Kbytes, whereas for Option 2, assuming the example above, 5 Kbytes of signature have to be read for each 20 Kbytes of code or data. Compared to Option 1, the authentication process of Option 3 is ongoing; therefore attacks after the initial boot are more difficult to launch for Option 3 than for Option 1.
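The signature overheads quoted for the three options can be reproduced with a few lines of arithmetic. The sketch below uses only the figures of the numerical example (1 Mbyte of content, 16 byte cache lines, 4 byte per-line signatures, 50 ranges with 16 byte signatures, ranges of 20 Kbytes); the variable names are illustrative.

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE

content_bytes = 1 * MBYTE

# Option 1: a single 2048-bit signature for the entire content.
option1_overhead = 2048 // 8                                   # bytes

# Option 2: a 4 byte signature for each 16 byte cache line.
cache_line_bytes = 16
line_signature_bytes = 4
option2_overhead = (content_bytes // cache_line_bytes) * line_signature_bytes

# Option 3: a 128-bit (16 byte) signature for each of 50 authentication ranges.
num_ranges = 50
range_signature_bytes = 128 // 8
option3_overhead = num_ranges * range_signature_bytes

# Signature bytes that must be read to authenticate one 20 Kbyte range.
range_bytes = 20 * KBYTE
option2_read_per_range = (range_bytes // cache_line_bytes) * line_signature_bytes
option3_read_per_range = range_signature_bytes

print("Option 1 overhead:", option1_overhead, "bytes")
print("Option 2 overhead:", option2_overhead // KBYTE, "Kbytes")
print("Option 3 overhead:", option3_overhead, "bytes")
print("Option 2 signature read per range:", option2_read_per_range // KBYTE, "Kbytes")
print("Option 3 signature read per range:", option3_read_per_range, "bytes")
```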

According to certain embodiments, the authentication module, when reading memory, shares cache logic used by the CPU for memory read. FIG. 4 is a block diagram illustration of a fully associative cache integrating cached memory content authentication according to certain embodiments. When the CPU 400 performs instruction fetch or data read, switch 4 (601) is set to state 1. If the address matches one of the cache line addresses, a HIT indication is generated, switch 1 is at state 1 and the CPU 400 receives data directly from the cache. In this case the authentication module 700 may perform data read operations directly from the system memory to authenticate an address range, switch 2 (604) is at state 0, and switch 3 (602) is at state 0.

Typically, the authentication module 700 identifies, on the address input line, an address that matches the authenticated address. Then, in the event of a HIT (if the policy of an address range allows it or if allowed for any address), switch 2 (604) and switch 1 are set to state 1 and the data from the cache memory reaches the authentication module (700) and may be used for authentication. Otherwise, e.g. in case of a cache MISS in which the address of the data read by the CPU matches the authenticated address, switch 1 is set to state 0 and the authentication module 700 may load the data for authentication.

When the CPU 400 is idle, Switch 4 (601) may be set to 0 and switch 3 (602) may also be set to state 0 to allow the on the fly (typically) authentication module 700 to use data recently loaded to the cache. In this case, for a HIT on the authenticated address, switch 1 is set to state 1 and the authentication module 700 sets switch 2 to state 1 in order to use data from the cache (in case of a cache HIT) for authentication. In case of a MISS, Switch 1 is set to state 0 and switch 2 is also set to state 0, to allow the authentication module 700 to use the data from the system memory for authentication.
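The switch settings described in the preceding paragraphs can be summarized as a small lookup function. This is only a sketch of one reading of the text: the function name `switch_states` and its parameters are hypothetical, and any switch whose state is not stated for a given case is returned as `None` rather than guessed.

```python
def switch_states(cpu_active, cache_hit, addr_in_auth_range, policy_allows_cache_data=True):
    """Return {switch number: state} for switches 1-4; None means 'not stated in the text'."""
    s = {1: None, 2: None, 3: None, 4: None}
    if cpu_active:
        s[4] = 1                      # CPU performs instruction fetch or data read
        if cache_hit:
            s[1] = 1                  # CPU receives data directly from the cache
            if addr_in_auth_range and policy_allows_cache_data:
                s[2] = 1              # cached data also reaches the authentication module
            else:
                s[2] = 0              # module reads system memory directly
                s[3] = 0
        elif addr_in_auth_range:
            s[1] = 0                  # MISS: module may load the data fetched for the CPU
    else:
        s[4] = 0                      # CPU idle: module may use recently cached data
        s[3] = 0
        s[1] = 1 if cache_hit else 0
        s[2] = 1 if cache_hit else 0
    return s

# CPU fetch hits the cache while the module authenticates an unrelated range.
print(switch_states(cpu_active=True, cache_hit=True, addr_in_auth_range=False))
# CPU idle, authenticated address misses the cache: data comes from system memory.
print(switch_states(cpu_active=False, cache_hit=False, addr_in_auth_range=True))
```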

According to certain embodiments, the system memory 100 may include a data storage portion (102) which may store data or code that is authenticated by a signature which may be located on the same memory component as the data or code, in signature storage portion (101), and/or on a different memory component. If the memory might be accessed by an attacker, the range signatures are typically encrypted. The data may also be encrypted, in which case the decryption engine and possibly also the encryption engine are typically implemented on the memory interface (200).

In FIG. 4 a cache control signal is typically used to control the bus peripheral signals when the authentication module 700 performs a read operation and/or in order to control switches 2, 3, 4 on the cache logic (601,602, 604). The cache status input may indicate cache HIT or cache MISS and/or input regarding the system operation status such as CPU idle.

The memory protection unit/s 800 provides a security level or “protection level” for each memory range which may indicate how often and/or under which circumstances to authenticate the range. A suitable policy may be defined by a system designer, or in firmware/software. The security policy input may represent a security configuration that may provide a security level for specific ranges, for an allowed period between authentication of a range or before executing from an address range. Security level may also set the authentication priority, such that a range with higher security level is, as aforesaid, authenticated before a range with lower security level. For embedded systems, typically including a processor/SOC (system on chip) and an external or stacked die memory, the security policy may be set by the system designer or software developer. Security typically includes authentication of code, thereby to yield secure code execution.

A stronger (higher) authentication level (e.g. signature length), such as an RSA signature of 256 bytes (typically performed after hashing e.g. using SHA1 or SHA2), may be used for authentication of the entire memory or for authentication of more than one range. Alternatively or in addition, each cache line may be authenticated or signed, typically using a shorter, weaker signature to limit signature overhead (for example 32 bit of signature for a cache line holding 128 bits of data). According to certain embodiments, an intermediate solution may be used such as for example 16 bits for 1K of code. This may be used to enable authentication of memory ranges typically between CPU operations, and/or during CPU idle and/or during consecutive CPU HIT events to prevent latency of the authentication process. For flexibility, authentication context switching may be employed to allow flexible authentication of memory ranges such that a relatively short authentication process ensures that a code about to run is authenticated before being fetched by the CPU 400.

It is appreciated that the apparatus of FIG. 4 may sometimes perform some or all of an authentication process at a time which is not during free time. For example, there may not be enough free time available to perform authentication mandated by a very strict security policy, in which case the apparatus may, for example, execute low security code, or use low security data, without authentication to generate enough free time for authentication mandated by the strict policy. Alternatively, the apparatus may halt processor execution and authenticate code before execution.

FIG. 5 is a block diagram illustration of the authentication module 700 of FIGS. 2, 4. The authentication module 700 typically includes an authentication address context memory (701), which typically comprises a group of at least one register/s and may also be implemented using a memory. The registers typically store the next address for authentication for each authentication range (each of n authentication ranges, in the illustrated example). An authentication control (703) typically selects the address context value that will be driven on the address-out by controlling a multiplexer (704). The On the Fly (typically) authentication control (703) may load an initial address of a range to be authenticated and may increment the authentication address context after each authentication data read. Once authentication data is present on the DATA-in lines, the Load Data signal loads the data into the Authentication engine (706) which may be implemented using SHA1/2 hardware in conjunction with conventional cyclic redundancy check (CRC) hardware, using GFx field multiplier hardware or by a processor running a suitable authentication or hashing algorithm, such as but not limited to cyclic redundancy check (CRC), checksum, GFx multiplier, SHA1, or SHA2.

In FIG. 5, the cache status is indicative of cache hits, e.g. occurrence of multiple cache hits. This in turn indicates that free time is available for the authentication engine to use (e.g. during cache HITs).

The authentication module 700 is typically configured to perform authentication context switching, using the authentication data context registers (702) or memory, e.g. by performing some or all of the following operations, suitably ordered e.g. as follows:

Operation a. If, during authentication of a specific range x, a higher priority range authentication is needed, the authentication control module 700 instructs the authentication data context module (702) to load the authentication context from the authentication engine via multiplexer (705).

Operation b. Once the context of range x has been loaded, authentication of the higher priority range may commence. The authentication control 703 selects the address context of the higher priority range using multiplexer 704.

Operation c. Once the higher priority range authentication has concluded, the authentication control module 700 may set the Load-context signal of the authentication engine (706) and load the authentication context of range x into the authentication engine. Using multiplexer 704, the authentication control 703 selects the address context of range x.

Operation d. The authentication control module 700 may then increment range x's authentication address context until the entire range x has been read and hashed by the authentication engine 706.

Operation e. Once the hashing process of range x is done, the On the Fly (typically) authentication control module (703) may perform a read of the range x signature from the signature storage (101) in the system memory 100 (in the illustrated example; alternatively the signature may be read from internal memory).

Operation f. The range x signature may be loaded via multiplexer (705) into the compare signature logic (707). If the signature is valid (equal to the signature computed by the authentication engine) the apparatus may continue operation without interruptions.

Operation g. However, if the range x signature is invalid, the authentication control block 703 may perform a pre-programmed authentication-failed procedure. For example, block 703 may provide a suitable attack warning indication e.g. to external hardware implemented security logic. Typically, security systems have security logic, typically hardware implemented, which may be triggered by the attack warning indication to (say) stop operation of the device and/or force the processor into a certain state and/or force logging of the attack or any other security event, e.g. using firmware and/or take any other suitable countermeasure which may prevent a possible attack.

Operation h. A cache HIT indication may be delivered using the cache status input.

Operation i. If the address indicated by Address-In is identical to one of the context addresses in the registers, the HIT data may be loaded into the authentication engine 706 together with the relevant range authentication context (internal state of the authentication engine 706) and this data may be authenticated. In this case, the authentication engine 706 is typically operative, on the fly, to increment the relevant range authentication address context (701).

Operation j: This accelerates the authentication process and/or reduces the required bandwidth for authentication. The operation may be enabled or disabled by the security policy input and/or by registers loaded with the security policy. These registers may be general registers for the entire address range for a group of authentication ranges or a separate register may be provided for each authentication range. The security level provided by the memory protection unit (MPU) (800) might affect the security level, hence authentication priority, of an address range.
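
By way of non-limiting illustration only, the authentication context switching of operations a to d may be modelled in software as in the following sketch. The names RangeContext, Engine and run are hypothetical, a running CRC-32 stands in for whatever hashing the authentication engine 706 actually performs, and the multiplexers and control signals of FIG. 5 are not modelled.

```python
import zlib
from dataclasses import dataclass


@dataclass
class RangeContext:
    """Saved state for one authentication range (hypothetical layout)."""
    end: int              # exclusive end address of the range
    next_addr: int        # plays the role of the address context (701)
    saved_state: int = 0  # plays the role of the data context (702)


class Engine:
    """Stand-in for the authentication engine (706), using a running CRC-32."""

    def __init__(self) -> None:
        self.state = 0

    def load_context(self, state: int) -> None:
        self.state = state

    def load_data(self, data: bytes) -> None:
        self.state = zlib.crc32(data, self.state)


def run(engine: Engine, memory: bytes, ctx: RangeContext, max_bytes: int) -> bool:
    """Resume ctx on the engine, hash up to max_bytes, then save the context.

    Returns True once the whole range has been read and hashed.
    """
    engine.load_context(ctx.saved_state)
    stop = min(ctx.next_addr + max_bytes, ctx.end)
    engine.load_data(memory[ctx.next_addr:stop])
    ctx.next_addr = stop
    ctx.saved_state = engine.state
    return ctx.next_addr >= ctx.end


memory = bytes(range(256)) * 4                 # 1024 bytes of sample content
range_x = RangeContext(end=512, next_addr=0)
urgent = RangeContext(end=1024, next_addr=512)
engine = Engine()

run(engine, memory, range_x, 100)              # range x is partly hashed
while not run(engine, memory, urgent, 64):     # higher priority range runs to completion
    pass
while not run(engine, memory, range_x, 64):    # range x resumes from its saved context
    pass
```

In this sketch the interrupted range yields the same final value as an uninterrupted pass over the same bytes, which is the property the saved context is intended to preserve.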

Often, the compare logic (708) does not indicate a match, and the authentication engine 706 may use the CPU HIT duration, where data is fetched or read from the cache memory by the CPU 400. During the CPU HIT duration, the authentication module (700) may access the system memory and read data for authentication. The authentication engine 706 typically requests a data-read using suitable cache control signals such as the control signals of the switches and/or of the system bus such as address, select, data width; the read address is driven through the address-out interface. The data read for authentication is transferred through the Data-in interface directly into the authentication engine (706). In this case the Load Data signal coming from the authentication control module 703 loads data into the authentication engine 706 in order to read authentication data from the system memory, and to authenticate the system memory storage, e.g. as per conventional cyclic redundancy check (CRC) technology.
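
By way of non-limiting illustration only, the use of CPU HIT durations for authentication reads may be simulated as in the following sketch. The per-cycle status strings, the treatment of "IDLE" cycles as free time, the four-byte read width and the function name are all assumptions made for this example, and a running CRC-32 again stands in for the authentication engine 706.

```python
import zlib
from typing import List, Optional, Tuple


def authenticate_during_hits(memory: bytes, start: int, end: int,
                             cache_status: List[str],
                             word: int = 4) -> Tuple[Optional[int], int]:
    """Read one word of [start, end) per free cycle and hash it.

    A cycle is treated as free when the cache status is "HIT" or "IDLE",
    since the memory interface is then not needed by the CPU.
    Returns (crc, cycle_number) when the range completes, or
    (None, cycles_seen) if the trace ends first.
    """
    addr, crc = start, 0
    for cycle, status in enumerate(cache_status, start=1):
        if status not in ("HIT", "IDLE"):
            continue                      # interface busy serving a MISS
        stop = min(addr + word, end)
        crc = zlib.crc32(memory[addr:stop], crc)
        addr = stop
        if addr >= end:
            return crc, cycle
    return None, len(cache_status)


memory = bytes(range(16))
trace = ["HIT", "MISS", "HIT", "HIT", "MISS", "HIT"]
crc, finished_at = authenticate_during_hits(memory, 0, 16, trace)
unfinished, seen = authenticate_during_hits(memory, 0, 16, ["MISS", "HIT"])
```

In the sample trace, four free cycles are needed for the sixteen-byte range, so the range completes on the sixth cycle; with the shorter trace the range is left incomplete and would resume from its saved context.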

CPU execution prediction refers to prediction of the address of the next code or data to be fetched or read by the processor. For example if a certain operation is running, the I/O code thereof may be authenticated in advance. And/or, if the apparatus is going into its sleep state, the relevant code may be authenticated in advance using conventional methods of execution prediction such as that performed for conventional speculative execution.

For CPU execution prediction, the authentication address context (701) may also hold an address or a plurality (e.g. range) thereof, to be compared by the compare logic (708) with the address on the Address-In interface. An address match between the two addresses may result in assignment of higher authentication priority to the range whose address/es were compared to the address on the Address-In interface. This normally causes authentication of an address range to occur, before code stored at that address range is used by the CPU.

Referring now to FIG. 6, it is appreciated that according to certain embodiments, the authentication module predicts content about to be fetched to the cache and then performs authentication of an address range known to be associated with the content. The content fetched may comprise instructions (code) and/or data.

Prediction may take place on any suitable basis. For example, if the apparatus changes its state, the authentication engine may receive an indication of the state change (to power-up e.g.) and responsively, may authenticate the code (e.g. boot code) that may be expected (predicted) to run after this state change. If the apparatus is processing data which will subsequently be sent to an external device, the code which may be expected (predicted) to be used for sending the data may be authenticated in advance.

According to certain embodiments, the authentication engine stores a table of events (e.g. the example table of FIG. 6), each event being associated with an address of a code that is likely to run or be executed, upon occurrence of that event. Alternatively or in addition, a table of addresses may be loaded to the authentication engine (aka authentication module), each address being associated with a range that is likely to run for authentication.

The table of FIG. 6 may be used for execution prediction, and typically associates current (executed) code address with pointers to ranges which are probably about to be used, given the code currently executed. The comparison of the current executed address may be implemented as part of the compare logic 708 in FIG. 5.

According to certain embodiments:

1. The programmer sets the current running address 0x10200, for example if this address generates a security key. The processor will output this security key using a code located in the range starting at address 0x76500. The authentication of the respective range will start when the authentication engine identifies a 0x10200 address; and/or

2. The programmer sets the current running address 0x4B300, for example, if this address is associated with system sleep state preparation. The processor will prepare sleep state using a code located in the range starting at address 0x10400. The authentication of the respective range will start when the authentication engine identifies a 0x4B300 address; and/or

3. The programmer sets the current running address 0x17200, for example, if this address is associated with system boot code. The processor will perform reboot using a code located in the range starting at address 0xD3200. The authentication of the respective range will start when the authentication engine identifies a 0x17200 address; and/or

4. The programmer sets the current running address 0x1A800, for example, if this address is associated with reading data from a random number generator. The processor will randomize a key over an elliptic curve using a code located in the range starting at address 0xAE000. The authentication of the respective range will start when the authentication engine identifies a 0x1A800 address.
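
By way of non-limiting illustration only, an execution prediction table of the kind described above may be modelled as a simple lookup, as in the following sketch. The dictionary and function names are hypothetical; the addresses are those of the four examples above, and the table is assumed to map each current running address to the start of the range to be authenticated in advance.

```python
from typing import Optional

# Current running address -> start address of the range predicted to run next.
PREDICTION_TABLE = {
    0x10200: 0x76500,  # security key generation -> code that outputs the key
    0x4B300: 0x10400,  # sleep state preparation -> sleep preparation code
    0x17200: 0xD3200,  # system boot code -> reboot code
    0x1A800: 0xAE000,  # random number read -> elliptic curve key randomization
}


def predict_range(current_address: int) -> Optional[int]:
    """Return the start of the range to authenticate in advance, if any."""
    return PREDICTION_TABLE.get(current_address)


for address in (0x10200, 0x1A800, 0x20000):
    target = predict_range(address)
    if target is None:
        print(f"0x{address:X}: no prediction")
    else:
        print(f"0x{address:X}: authenticate range starting at 0x{target:X}")
```

An address with no table entry simply triggers no advance authentication in this sketch; how such addresses are handled in practice would depend on the security policy.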

FIG. 7 is a simplified flowchart illustration of an authentication method configured in accordance with embodiments of the present invention, e.g. in conjunction with some or all of the apparatus shown and described with reference to FIGS. 1-5, 6. The method of FIG. 7 typically includes any subset of or all of the following operations, suitably ordered e.g. as follows:

Operation 710: provide suitable architecture e.g. some or all of: external memory, cache memory, processor, authentication module, and switches (e.g. switches 602 and 604 of FIG. 4) configured and positioned to enable the authentication module to selectably access the external memory e.g. when the processor is executing code from the internal cache memory.

Operation 715: load the code to the external memory in advance or during boot time in a secured manner.

Operation 720: Installation or update of content e.g. code in system memory including or not including conventional authentication thereof.

Operation 730: Split code/content into authentication ranges equal or not equal in size. If the authentication range sizes are not equal, the start address and end address of each range (or range size) may be stored in a predetermined place on the system memory.

Operation 740: Generate signature per authentication range and store on the system memory; the signature generation may be done in operation 720 when the installation or code update takes place.

Operation 750: Store association/correspondence between locations of authentication ranges and of signatures, in system memory, or use predetermined association/correspondence.

Operation 760: copy start address and end address/size into the authentication unit before the authentication of a range is initiated. If fixed size is known in advance, ranges need not be used.

Operation 770: Execution of code/use of data including reading code/data into cache from the memory, e.g. using any subset of the operations in FIG. 8, suitably ordered e.g. as shown.
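
By way of non-limiting illustration only, operations 730 to 750 may be sketched in software as follows. The function names are hypothetical, equal-size ranges are assumed (with a shorter final range), and a plain SHA-256 digest stands in for the signature; an actual embodiment may use any of the signature technologies mentioned herein, optionally encrypted.

```python
import hashlib
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (start address, exclusive end address)


def split_into_ranges(content_len: int, range_size: int) -> List[Range]:
    """Operation 730: split the content into authentication ranges."""
    if range_size <= 0:
        raise ValueError("range_size must be positive")
    return [(start, min(start + range_size, content_len))
            for start in range(0, content_len, range_size)]


def sign_ranges(content: bytes, ranges: List[Range]) -> Dict[Range, bytes]:
    """Operations 740 and 750: one signature per range, keyed by its location.

    Assumption: a SHA-256 digest stands in for the stored signature.
    """
    return {r: hashlib.sha256(content[r[0]:r[1]]).digest() for r in ranges}


content = bytes(i % 251 for i in range(5000))   # sample content
ranges = split_into_ranges(len(content), 2048)
signatures = sign_ranges(content, ranges)
```

Keying each signature by its range boundaries is one possible way of storing the association of operation 750; a predetermined layout, as also contemplated above, would make the explicit table unnecessary.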

FIG. 8 is a simplified flowchart illustration of operations which may be performed when reading code/data into cache from memory. The method of FIG. 8 may include some or all of the following operations, suitably ordered e.g. as shown:

Operation 770 a: Authenticating each authentication range during free time by comparing computed signature to stored signature corresponding to each authentication range. To protect the signature from being attacked the signature may be encrypted and/or RSA or another public key cryptographic algorithm may be employed.

Operation 770 b: If code from a higher priority range is required, by virtue of the code's range policy, to be authenticated before execution and execution of the higher priority range's code is imminent, interrupt operation 770 a for the range currently being authenticated, store the context of the range currently being authenticated, perform authentication of the higher priority range, and then complete operation 770 a on the lower priority range, using the stored context. It is appreciated that this operation may occur recursively, if higher and higher priority codes or other content pile up and need to be authenticated at the cost of interrupting authentication of earlier encountered, but less highly prioritized, content. It is appreciated that the logic which determines whether to interrupt or not may optionally be omitted e.g. if it is appropriate to allow authentication to proceed in chronological e.g. first-in-first-out order.

Operation 770 c: If authentication fails, indicating that content e.g. code is un-authentic, perform suitable authentication-has-failed process e.g. some or all of: output a warning, start execution of fail-safe code, await external input before proceeding to execute the code whose authentication has failed.
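
By way of non-limiting illustration only, operations 770 a and 770 c may be sketched in software as follows. The table layout and function names are hypothetical, a SHA-256 digest stands in for the stored signature, and simple ordering by priority stands in for the interruption and context restoration of operation 770 b, which is not modelled here.

```python
import hashlib
import hmac
from typing import Callable, List, Tuple

# (priority, start, exclusive end, stored signature); layout is hypothetical.
Entry = Tuple[int, int, int, bytes]


def range_is_authentic(memory: bytes, start: int, end: int, stored: bytes) -> bool:
    """Operation 770 a: compare the computed signature to the stored one."""
    computed = hashlib.sha256(memory[start:end]).digest()
    return hmac.compare_digest(computed, stored)


def authenticate(memory: bytes, table: List[Entry],
                 on_failure: Callable[[int, int], None]) -> bool:
    """Authenticate ranges, higher priority first; stop at the first failure."""
    for _priority, start, end, stored in sorted(
            table, key=lambda entry: entry[0], reverse=True):
        if not range_is_authentic(memory, start, end, stored):
            on_failure(start, end)        # operation 770 c
            return False
    return True


good = bytes(range(64))
table = [
    (1, 0, 32, hashlib.sha256(good[0:32]).digest()),
    (5, 32, 64, hashlib.sha256(good[32:64]).digest()),
]
failures: List[Tuple[int, int]] = []

ok_clean = authenticate(good, table, lambda s, e: failures.append((s, e)))
tampered = good[:40] + b"\xff" + good[41:]     # corrupt one byte in the second range
ok_tampered = authenticate(tampered, table, lambda s, e: failures.append((s, e)))
```

The failure callback in this sketch merely records the offending range; in an embodiment it would correspond to the authentication-has-failed process, e.g. outputting a warning or starting execution of fail-safe code.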

According to certain embodiments, some or all of the following information may be stored on external memory:

i. Content—1M byte in size in the illustrative example used above. May be code or data (=non-code).

ii. Signature of each authentication range, 50 in the illustrative example used above.

iii. Address range of each range (stored e.g. if the ranges are not equal and cannot be computed).

iv. Range Policy.

All of the above information, e.g. all of information items i-iv above, may be authenticated before first use or before each usage.

The address range and policy may be copied into a secured location (the main die) during the first authentication and before being used for the first time. The address range and policy may be authenticated each time they are used.

Copying the address ranges into the Authentication Unit 700 is advantageous since this allows efficient memory range to policy and status mapping. The status/address range of the authentication ranges and the policy info may be held as part of the context, when interruptions occur.

Co-pending published U.S. Patent Application US2015263852 filed 23 Sep. 2014, describes systems and methods for secure storage on external memory. It is appreciated that any of the aspects of the invention described therein may be suitably combined with any of the aspects described herein.

It is appreciated that the authentication policy may take into account that content in the cache, once authenticated, is considered more secure, whereas content on the external system memory is typically considered less secure.

An advantage of certain embodiments is that content in the cache, once authenticated, is more secure than content on external system memory since reading information from a silicon die requires difficult-to-obtain knowledge regarding the die architecture and structure and sophisticated equipment with specialized operators; furthermore, even given these, it is difficult to wire more than a few signals, without destroying the die.

It is appreciated that terminology such as “mandatory”, “required”, “need” and “must” refer to implementation choices made within the context of a particular implementation or application described herewithin for clarity and are not intended to be limiting since in an alternative implementation, the same elements might be defined as not mandatory and not required or might even be eliminated altogether.

It is appreciated that software components of the present invention including programs and data may, if desired, be implemented in ROM (read only memory) form including CD-ROMs, EPROMs and EEPROMs, or may be stored in any other suitable typically non-transitory computer-readable medium such as but not limited to disks of various kinds, cards of various kinds and RAMs. Components described herein as software may, alternatively, be implemented wholly or partly in hardware and/or firmware, if desired, using conventional techniques, and vice-versa. Each module or component may be centralized in a single location or distributed over several locations.

Included in the scope of the present disclosure, inter alia, are electromagnetic signals in accordance with the description herein. These may carry computer-readable instructions for performing any or all of the operations of any of the methods shown and described herein, in any suitable order including simultaneous performance of suitable groups of operations as appropriate; machine-readable instructions for performing any or all of the operations of any of the methods shown and described herein, in any suitable order; program storage devices readable by machine, tangibly embodying a program of instructions executable by the machine to perform any or all of the operations of any of the methods shown and described herein, in any suitable order; a computer program product comprising a computer useable medium having computer readable program code, such as executable code, having embodied therein, and/or including computer readable program code for performing, any or all of the operations of any of the methods shown and described herein, in any suitable order; any technical effects brought about by any or all of the operations of any of the methods shown and described herein, when performed in any suitable order; any suitable apparatus or device or combination of such, programmed to perform, alone or in combination, any or all of the operations of any of the methods shown and described herein, in any suitable order; electronic devices each including at least one processor and/or cooperating input device and/or output device and operative to perform e.g. in software any operations shown and described herein; information storage devices or physical records, such as disks or hard drives, causing at least one computer or other device to be configured so as to carry out any or all of the operations of any of the methods shown and described herein, in any suitable order; at least one program pre-stored e.g. in memory or on an information network such as the Internet, before or after being downloaded, which embodies any or all of the operations of any of the methods shown and described herein, in any suitable order, and the method of uploading or downloading such, and a system including server/s and/or client/s for using such; at least one processor configured to perform any combination of the described operations or to execute any combination of the described modules; and hardware which performs any or all of the operations of any of the methods shown and described herein, in any suitable order, either alone or in conjunction with software. Any computer-readable or machine-readable media described herein is intended to include non-transitory computer- or machine-readable media.

Features of the present invention, including operations, which are described in the context of separate embodiments may also be provided in combination in a single embodiment. For example, a system embodiment is intended to include a corresponding process embodiment and vice versa. Also, each system embodiment is intended to include a server-centered “view” or client centered “view”, or “view” from any other node of the system, of the entire functionality of the system, computer-readable medium, apparatus, including only those functionalities performed at that server or client or node. Features may also be combined with features known in the art and particularly although not limited to those described in the Background section or in publications mentioned therein.

Conversely, features of the invention, including operations, which are described for brevity in the context of a single embodiment or in a certain order may be provided separately or in any suitable subcombination, including with features known in the art (particularly although not limited to those described in the Background section or in publications mentioned therein) or in a different order. “e.g.” is used herein in the sense of a specific example which is not intended to be limiting. Each method may comprise some or all of the operations illustrated or described, suitably ordered e.g. as illustrated or described herein. 

The invention claimed is:
 1. Apparatus for authentication of system memory content, the apparatus operative in conjunction with system memory storing content utilized by a central processing unit (CPU), the apparatus comprising: cache memory in data communication with the system memory; and authentication logic in communication with the system memory, wherein the authentication logic, when authenticating content, reads the content directly from the system memory, the authentication logic being operative to use free time over a memory interface defined between the system memory and the cache memory, for authentication of content stored in the system memory, including use of at least one signature stored on the system memory to authenticate the content, wherein the authentication logic at least once schedules content authentication over a time period T selected by the logic to coincide at least partly with a window of free time over the memory interface, and wherein, when the CPU performs at least one of instruction fetch and data read, a switch on a cache logic is set to state 1, connecting a cache address to the CPU, and wherein, if the address matches one of the cache memory's line addresses, a cache HIT indication is generated, a switch on the cache logic is at state 1, connecting code or data lines of the CPU to a cache output, a switch on the cache logic is at state 0, connecting the authentication logic with system memory data, and a switch on the cache logic is at state 0, connecting a system logic address with the authentication logic, and the CPU receives data directly from the cache, thereby to enable the authentication logic to perform data read operations directly from the system memory to authenticate an address range.
 2. The apparatus of claim 1 where the signature is stored encrypted, rather than as plain text, in system memory, thereby to deter an attacker from replacing code so as to forge the signature.
 3. The apparatus according to claim 2 where the content stored in system memory is also encrypted.
 4. The apparatus according to claim 1, which supports plural authentication ranges each signed separately using separate signatures each having specific storage locations respectively, and wherein signature data context is stored, including a pointer to the address most recently accessed, for at least one range whose authentication was interrupted.
 5. The apparatus according to claim 1 wherein said authentication logic, rather than reading data directly from external memory, captures content fetched by the CPU from the cache memory and performs authentication on said content.
 6. The apparatus according to claim 1, wherein said authentication logic, rather than reading data directly from external memory, captures content fetched by the CPU from the system memory and performs authentication on said content.
 7. The apparatus according to claim 1, wherein said authentication logic performs at least one authentication operation when the CPU is idle.
 8. The apparatus according to claim 1, wherein the authentication logic receives at least one security indication from a system MPU, and acts upon the at least one security indication as determined by a pre-stored security policy.
 9. The apparatus according to claim 8, where the security policy is selectably determined, per code functionality, for the authenticated range.
 10. The apparatus according to claim 4, where at least a security level priority is selectably set, separately for each of a plurality of groups of ranges, each group comprising at least one range.
 11. The apparatus according to claim 4, where the authentication logic predicts content about to be fetched to the cache and then performs authentication of an address range known to be associated with said content.
 12. The apparatus according to claim 1, where the authentication logic, when reading memory, shares cache logic used by the CPU for memory read.
 13. The apparatus according to claim 1, wherein said content comprises code representing at least a portion of a procedure to be executed.
 14. The apparatus according to claim 1, wherein the authentication logic is configured to perform authentication context switching.
 15. The apparatus according to claim 1, wherein an authentication address context is used to hold a range of at least one address, which is compared by compare logic with an address on an Address-In interface and wherein, if an address match occurs, higher authentication priority is assigned to said range, whereas if the compare logic does not indicate an address match, an authentication engine uses a CPU HIT duration to access the system memory thereby to read data therefrom for authentication.
 16. The apparatus according to claim 13 wherein the procedure includes at least one high-criticality portion and at least one low-criticality portion and wherein the authentication ranges include at least pre-defined first and second authentication ranges such that the at least one high-criticality portion of the procedure is incorporated into the first authentication range and that the at least one low-criticality portion of the procedure is incorporated into the second authentication range, thereby to enable the authentication logic to apply a first more stringent authentication policy to the first authentication range and a second less stringent authentication policy to the second authentication range.
 17. The apparatus according to claim 13 wherein the procedure includes at least one frequently-executed portion and at least one infrequently-executed portion and wherein the authentication ranges include at least pre-defined first and second authentication ranges such that the at least one frequently-executed portion of the procedure is incorporated into the first authentication range and that the at least one infrequently-executed portion of the procedure is incorporated into the second authentication range, thereby to enable the authentication logic to authenticate the first authentication range more frequently than the second authentication range.
 18. An apparatus according to claim 1, wherein said authentication comprises on-the-fly authentication.
 19. The apparatus according to claim 1, wherein said authentication logic performs authentication operations only when the CPU is idle.
 20. The apparatus according to claim 7 wherein the authentication logic knows that the CPU is idle due to a system-provided signal indicating idle state.
 21. The apparatus according to claim 20 wherein the system-provided signal indicates at least one of the following idle states: sleep state, wait loop, clock stop.
 22. The apparatus according to claim 9 wherein the security policy stipulates that code used for internal operation/s has a lower authentication priority, relative to code used for output to an external device which enjoys a higher authentication priority.
 23. The apparatus according to claim 9 wherein the security policy stipulates that authentication of code must be performed before execution thereof.
 24. The apparatus according to claim 10 wherein at least one group comprises a plurality of ranges.
 25. The apparatus according to claim 1 wherein the system memory is external to at least one die on which at least the CPU and cache reside.
 26. The apparatus according to claim 1, wherein at least the CPU and cache, but not the system memory, are packed together in a single package.
 27. The apparatus according to claim 1, wherein the authentication logic comprises a purely hardware module.
 28. The apparatus according to claim 1, and also comprising a system memory protection unit (MPU) configured to send at least one security indication to the authentication logic to be acted upon in accordance with a security-indication dependent security policy.
 29. The apparatus according to claim 1, wherein the CPU and authentication logic reside on a CPU die, and the system memory is external to the CPU die.
 30. The apparatus according to claim 1 wherein the authentication logic comprises a processor.
 31. The apparatus according to claim 1, wherein the authentication logic includes logic operative, if possible, to schedule content authentication over a time period T selected by the logic to coincide at least partly with a window of free time over the memory interface.
 32. The apparatus according to claim 1, wherein the logic is configured to schedule content authentication over a time period T selected by the logic to be entirely included, if possible, within the window of free time over the memory interface.
 33. A method for authentication of system memory content which is operative in conjunction with system memory storing content utilized by a central processing unit (CPU), the method comprising: providing a cache memory in data communication with the system memory; and employing authentication logic, in data communication with the system memory, during free time over a memory interface defined between the system memory and the cache memory, for authentication of content stored in the system memory, wherein when the authentication logic authenticates content, the authentication logic reads the content directly from the system memory, and wherein, when the CPU performs at least one of instruction fetch and data read, a switch on a cache logic is set to state 1, connecting a cache address to the CPU, and wherein, if the address matches one of the cache memory's line addresses, a cache HIT indication is generated, a switch on the cache logic is at state 1, connecting code or data lines of the CPU to a cache output, a switch on the cache logic is at state 0, connecting the authentication logic with system memory data, and a switch on the cache logic is at state 0, connecting a system logic address with the authentication logic, and the CPU receives data directly from the cache, thereby to enable the authentication logic to perform data read operations directly from the system memory to authenticate an address range.
 34. The method according to claim 33 wherein the content is stored, for utilization by a CPU residing on a CPU die, on system memory external to the CPU die, the employing comprising: storing, on the system memory, the content, in a plurality of ranges each including plural cache lines; and a signature set including at least one signature and a policy set including at least one authentication policy, thereby to define an additional, “security” range within the system memory which holds the signature and policy sets; wherein each of the plurality of ranges is associated with a signature in said signature set and with an authentication policy in said policy set; and providing a hardware authentication module on the CPU die which is configured to authenticate said content by accessing said content, range by range, including verifying at least one individual range's signature according to the individual range's policy and wherein said accessing occurs upon at least one occasion defined by said range's policy.
 35. The method according to claim 34 wherein the hardware authentication module is configured to authenticate said content by accessing said content, range by range, including verifying at least one individual range's signature according to the individual range's security level which is stored as part of the individual range's policy.
 36. The method according to claim 34 wherein at least one individual range's policy is signed by the individual range's signature, thereby to prevent unauthorized policy change.
 37. The method according to claim 34 wherein at least one individual range's policy is signed by a signature other than said individual range's signature, thereby to prevent unauthorized policy change.
 38. The apparatus according to claim 1, wherein each authentication range is stored in association with at least one predefined range address.
 39. The apparatus according to claim 1 wherein the authentication logic identifies, on an address input line, an address that matches an authenticated address, wherein, in the event of a HIT, the first and second switches are set to state 1 and the data from the cache memory reaches the authentication logic for use in authentication and wherein otherwise, in case of a cache MISS in which an address of data read by the CPU matches an authenticated address, the first switch is set to state 0 and the authentication logic loads the data for authentication.
 40. The apparatus according to claim 1 and wherein, when the CPU is idle, the third and fourth switches are set to state 0 to allow the authentication logic to use data recently loaded to the cache memory and wherein, for a cache HIT on the authenticated address, the first and second switches are set to state 1 by the authentication logic to use data from the cache for authentication, whereas, for a cache MISS, the first and second switches are set to state 0 to allow the authentication logic to use data from the system memory for authentication.
 41. The apparatus according to claim 1, wherein a cache control signal is used to control bus peripheral signals when the authentication logic performs a read operation.
 42. The apparatus according to claim 1, wherein a cache control signal is used to control the second, third and fourth switches on the cache logic.
 43. The apparatus according to claim 1, wherein a cache status signal provided by cache logic to the authentication logic indicates at least one of: cache HIT/cache MISS; and a CPU idle state.
 44. The apparatus according to claim 43 and wherein occurrence of multiple cache HITs indicates that free time is available during cache HITs.
 45. The apparatus according to claim 1, wherein the authentication logic includes an authentication address context memory including a group of at least one register configured for storing a next address for authentication for each of plural authentication ranges.
 46. The apparatus according to claim 1 wherein an authentication control selects an address context value to be driven on address-out by controlling a multiplexer.
 47. The apparatus according to claim 46 wherein the authentication control is configured to load an initial address of a range to be authenticated and to increment an authentication address context after authentication data read.
 48. The apparatus according to claim 19 wherein the authentication logic knows that the CPU is idle due to a system-provided signal indicating idle state.
 49. The apparatus according to claim 1 wherein said cache memory and said authentication logic are on a single bus. 
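By way of non-limiting illustration only, the authentication address context memory of claims 45 to 47 and the selection between cache data and system memory data of claim 40 may be modelled in software roughly as follows. This is a behavioural sketch under stated assumptions, not the claimed hardware: the names `RangeContext`, `AuthLogic` and `step` are hypothetical, the cache is modelled as a simple dictionary, reads are byte-granular rather than cache-line-granular, and SHA-256 is assumed as the signature function merely because SHA-type signatures are mentioned as one suitable technology.

```python
import hashlib


class RangeContext:
    """Hypothetical per-range state held in the address context memory."""

    def __init__(self, start, end):
        self.start = start               # first address of the range
        self.end = end                   # one past the last address
        self.next_addr = start           # context register: next address to read
        self.digest = hashlib.sha256()   # running signature (assumed SHA-256)


class AuthLogic:
    """Toy model: authenticates ranges one read at a time during free time."""

    def __init__(self, system_memory, cache, ranges, signatures):
        self.mem = system_memory         # bytes-like system memory
        self.cache = cache               # dict: address -> byte value
        self.contexts = [RangeContext(s, e) for s, e in ranges]
        self.signatures = signatures     # expected digest for each range
        self.results = {}                # range index -> True/False

    def step(self, idx, cpu_idle):
        """Do one authentication read for range idx if the CPU is idle.

        Returns "cache" or "memory" (the data source used), or None when
        no free time is available.
        """
        if not cpu_idle:
            return None
        ctx = self.contexts[idx]
        addr = ctx.next_addr
        if addr in self.cache:           # cache HIT: take data from the cache
            value, source = self.cache[addr], "cache"
        else:                            # cache MISS: read system memory directly
            value, source = self.mem[addr], "memory"
        ctx.digest.update(bytes([value]))
        ctx.next_addr += 1               # increment context after the read
        if ctx.next_addr == ctx.end:     # whole range read: verify and restart
            self.results[idx] = ctx.digest.digest() == self.signatures[idx]
            ctx.next_addr = ctx.start
            ctx.digest = hashlib.sha256()
        return source
```

Because each range keeps its own next-address context, reads for different ranges may be interleaved across separate idle periods without losing progress in any range, which is the purpose the per-range registers appear to serve in the claims above.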